Skip to main content

S3

Query

The Data Service is configured by the Qarbine administrator with an optional default bucket. This is set in the general purpose “Database” field. That value, or one chosen from a “Database” dropdown, provides context on which S3 bucket is being accessed.

Qarbine uses the generic term “collections” which maps to S3 “objects”. When listing these items in drop downs the following rules apply.

  • Any '/' characters for folder delimiters get replaced with underscores because SQL does not like '/'.
  • SQL also does not like '.' in the table name. Any file extension is removed and intermediate folder dots are converted to '_' characters. The object name 'data/foo/animals.json' shows up as 'data_foo_animals'. As a result a bucket with foo.csv and foo.json will list “foo” twice.

SQL Oriented Queries

Qarbine provides a SQL oriented querying interface to the S3 objects which are JSON, CSV, TSV, or XLSX based. The object name is used as the SQL “table” name. Note that in all cases the complete object is read and then processed. For large files a more optimized interaction may be preferred.

The JSON content has to be an array of objects. Here is a sample query,

select * from portfolio.json

The result is shown below.

  

Here is a query which sorts the answer set.

select * from portfolio.json order by cash asc

The result is shown below.

  

Here is a sample query with a WHERE clause.

select * from portfolio.json where isIRA = false

The result is shown below.

  

Table names ending with “.csv” are assumed to be comma separated value (CSV) files. Those ending in “.tsv” are tab separated value (TSV) files. The first row contains the field names.

The following column names may not be referenced: ‘target’ and ‘value’.

Virtual Queries

Request Description
readTextFile OBJECTRead the contents of the references object. The result is an array with a single element. The element has a field ‘content’ with the text.
readJsonFile OBJECTRead the contents of the references object and convert the text into a JSON object. If the JSON object is an array then it is left as-is. Otherwise the result is an array with a single element. The element has a field ‘content’ with the JSON object.
readBinaryFile OBJECTRead the contents of the references object. The result is an array with a single element. The element has a field ‘content’ with the binary value.
list bucketsList the buckets accessible using the active data service.
list tablesList the objects in the bucket. This query uses the Qarbine general purpose “tables” term.

Troubleshooting

When querying JSON or CSV files, you can always view the contents using the “readTextFile XXX” query. Be aware the complete file is retrieved so it may be best to avoid this with really large files.